Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lexing of regex at the start of a if/loop #2529

Merged
merged 1 commit into from
Jul 10, 2024

Conversation

bobrippling
Copy link
Contributor

This only really appears in code (espruino/BangleApps#3498) which is minified to something like:

var b=E.getPowerUsage(),
  c=E.getBattery(),
  a=0;

for(var d in b.device)
  /^(LCD|LED)/.test(d) || (a += b.device[d]);

... resulting in:

Uncaught SyntaxError: Got '^' expected EOF
 at line 1 col 109 in widbattpwr.wid.js
...attery(),a=0;for(var d in b.device)/^(LCD|LED)/.test(d)||(a+=b.device[...
                                        ^
 at line 1 col 110 in widbattpwr.wid.js
...ttery(),a=0;for(var d in b.device)/^(LCD|LED)/.test(d)||(a+=b.device[d...
                                        ^

So while the code runs fine normally, installing it via the proper route minifies it and leads to it being unable to run.


This is the case for both if statements and loops:

>for(;0;)/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 9
for(;0;)/a/;
        ^
 at line 1 col 12
for(;0;)/a/;
           ^
>for(x in [])/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 13
for(x in [])/a/;
            ^
 at line 1 col 16
for(x in [])/a/;
               ^

while & if statements:

>while(0)/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 9
while(0)/a/;
        ^
 at line 1 col 12
while(0)/a/;
           ^
>if(0)/a/;
Uncaught SyntaxError: Got '/' expected EOF
 at line 1 col 6
if(0)/a/;
     ^
>if(0)3;
=undefined

The issue is that we don't lex regex immediately after a close-paren, so the fix is quite straightforward.

This only really appears in code (espruino/BangleApps#3498) which is minified to something like:
```javascript
var b=E.getPowerUsage(),
  c=E.getBattery(),
  a=0;

for(var d in b.device)
  /^(LCD|LED)/.test(d) || (a += b.device[d]);
```

... resulting in:

```
Uncaught SyntaxError: Got '^' expected EOF
 at line 1 col 109 in widbattpwr.wid.js
...attery(),a=0;for(var d in b.device)/^(LCD|LED)/.test(d)||(a+=b.device[...
                                        ^
 at line 1 col 110 in widbattpwr.wid.js
...ttery(),a=0;for(var d in b.device)/^(LCD|LED)/.test(d)||(a+=b.device[d...
                                        ^
```

So while the code runs fine normally, installing it via the proper route minifies it and leads to it being unable to run.

---

This is the case for both `if` statements and loops:

```javascript
>for(;0;)/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 9
for(;0;)/a/;
        ^
 at line 1 col 12
for(;0;)/a/;
           ^
>for(x in [])/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 13
for(x in [])/a/;
            ^
 at line 1 col 16
for(x in [])/a/;
               ^
```

while & if statements:

```javascript
>while(0)/a/;
Uncaught SyntaxError: Got ';' expected EOF
 at line 1 col 9
while(0)/a/;
        ^
 at line 1 col 12
while(0)/a/;
           ^
>if(0)/a/;
Uncaught SyntaxError: Got '/' expected EOF
 at line 1 col 6
if(0)/a/;
     ^
>if(0)3;
=undefined
```

The issue is that we don't lex regex immediately after a close-paren, so
the fix is quite straightforward.
@gfwilliams
Copy link
Member

Excellent - thanks for tracking this down! Nice easy fix, but potentially a hard one to find :)

I'll update the parser in Espruinotools too

@gfwilliams gfwilliams merged commit 28b41fd into espruino:master Jul 10, 2024
19 checks passed
gfwilliams added a commit to espruino/EspruinoTools that referenced this pull request Jul 10, 2024
gfwilliams added a commit to espruino/EspruinoWebIDE that referenced this pull request Jul 10, 2024
@bobrippling
Copy link
Contributor Author

Thank you!

@bobrippling bobrippling deleted the fix/re-statement-syntax branch July 10, 2024 11:34
@thyttan
Copy link
Contributor

thyttan commented Jul 10, 2024

@gfwilliams @bobrippling

I suspect this has broken something.

The clock face doesn't show - it's just blank. Clicking the hw button fastloads the launcher (dtlaunch).

Fastloading dtlaunch from clock:

SyntaxError: SyntaxError: Got REGEX expected ','
SyntaxError: SyntaxError: Got REGEX expected ','

Fastloading antonclk from dtlaunch:

SyntaxError: SyntaxError: Got REGEX expected ','
Uncaught Error: Function "time" not found!
 at line 2 col 4590 in antonclk.app.js
...var timeStr=require("locale").time(date,1);g.setFontAlign(0,0)....
                                 ^
in function "draw" called from line 3 col 383 in antonclk.app.js
...;Bangle.loadWidgets();draw();setTimeout(Bangle.drawWidgets,0...
                              ^

Long press to to reset to clock face antonclk (from antonclk):

widmessages.wid.js SyntaxError: SyntaxError: Got EOF expected '}'
widminbate.wid.js SyntaxError: SyntaxError: Got EOF expected '}'
Uncaught SyntaxError: Got ',' expected EOF
 at line 2 col 138
...econds()).substr(-2)+" "+b},dow:(a,b)=>"Sunday Monday Tuesda...
                              ^
at line 2 col 4601 in antonclk.app.js
...quire("locale").time(date,1);g.setFontAlign(0,0).setFont("Anton"...
                              ^
in function "draw" called from line 3 col 383 in antonclk.app.js
...;Bangle.loadWidgets();draw();setTimeout(Bangle.drawWidgets,0...
                              ^
> 

EDIT:

The same operations as above but without pretokenized apps:

>
>
SyntaxError: SyntaxError: Got REGEX expected ','
SyntaxError: SyntaxError: Got REGEX expected ','
SyntaxError: SyntaxError: Got REGEX expected ','
>
>
>
>
SyntaxError: SyntaxError: Got REGEX expected ','
Uncaught SyntaxError: Got UNFINISHED REGEX expected EOF
 at line 13 col 24 in antonclk.app.js
var x = g.getWidth() / 2;
                     ^
in function "draw" called from line 43 col 6 in antonclk.app.js
draw();
     ^
>
>
>
>
widmessages.wid.js SyntaxError: SyntaxError: Got EOF expected '}'
widminbate.wid.js SyntaxError: SyntaxError: Got EOF expected '}'
Uncaught SyntaxError: Got UNFINISHED REGEX expected EOF
 at line 13 col 24 in antonclk.app.js
var x = g.getWidth() / 2;
                     ^
in function "draw" called from line 43 col 6 in antonclk.app.js
draw();
     ^
>
>
>
> 

@gfwilliams
Copy link
Member

var x = g.getWidth() / 2;

And that is why we didn't have ) listed I guess! I'll revert this and the EspruinoTools changes then

gfwilliams added a commit to espruino/EspruinoTools that referenced this pull request Jul 10, 2024
gfwilliams added a commit that referenced this pull request Jul 10, 2024
@gfwilliams
Copy link
Member

Thanks for testing and reporting this @thyttan!

Did it actually work for you on a real watch @bobrippling ? It looks like none of the desktop test cases found any issues with this, but I have now just added one

@bobrippling
Copy link
Contributor Author

Thanks for the quick find! I didn't find an issue but I only gave it a cursory check that the interpreter could handle regex.


So to fix this I think we'll need to change how things are done - it's a large change so I'll run it by here to see what you both think first.

I think we should stop lexing a regex and instead decide whether we have one based on the parsing state, so we're not dependant on a look-behind at the previous token. My reasoning here is at a parse level it's relatively trivial to tell if we're going to have a regex or not:

x = a /
//    ^ when the parser is here, we have a bianry-/, so it's obvious it's a division

x = /
//  ^ when the parser is here, we have a unary-/, so it's either a regex or a comment.
// the lexer takes care of comments, so we know it's a regex,
// so we now step through this with the usual regex parsing code

This would mean removing LEX_REGEX (and associated code) and modifying jspeUnaryExpression to handle unary-/ and parse the regex.

What do you think?

@gfwilliams
Copy link
Member

gfwilliams commented Jul 11, 2024

I'm afraid this won't work, at least not without crippling performance - Espruino uses bracket counting to skip over areas of code that it doesn't want to execute. Either when parsing functions, if(false) or even just pasting code into the REPL. For example:

function hello(x) {
  return /}/.match(x);
}
hello("test");

So in your case in order to parse the function we have to be able to run over the entire parse tree each time we looked over any code, because otherwise } is parsed as a token and it thinks the function has ended.

I see two options here really:

  • Add a state machine in the lexer which says if it sees if/while/for followed by ( and then a ) at the same level, then it sees a / it's got a regex. Even this is nontrivial because we'll need to ensure we clone that state when we seek the Lexer to a new position
  • We modify EspruinoTools such that when it minifies and there's a regex right at the start, it brackets it - because it does feel like this kind of pattern doesn't really happen in sane, hand-written code

bobrippling added a commit to bobrippling/BangleApps that referenced this pull request Jul 11, 2024
bobrippling added a commit to bobrippling/BangleApps that referenced this pull request Jul 11, 2024
@bobrippling
Copy link
Contributor Author

Dang, I had wondered what the rationale was behind the lookbehind code. I think the lexer state machine could be good, I'll take a look but it might be a while - I'm away for a bit with it being summer!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants